Human behaviors are often driven by human interests. Despite intense recent efforts in exploring the 
dynamics of human behaviors, little is known about human-interest dynamics, partly due to the extreme 
difficulty in accessing the human mind from observations. However, the availability of large-scale data, such 
as those from e-commerce and smart-phone communications, makes it possible to probe into and quantify 
the dynamics of human interest. Using three prototypical "Big Data" sets, we investigate the scaling 
behaviors associated with human-interest dynamics. In particular, from the data sets we uncover fat-tailed 
(possibly power-law) distributions associated with the three basic quantities: (1) the length of continuous 
interest, (2) the return time to a previously visited interest, and (3) interest ranking and transition. We argue that 
there are three basic ingredients underlying human-interest dynamics: preferential return to previously 
visited interests, inertial effect, and exploration of new interests. We develop a biased random-walk model, 
incorporating the three ingredients, to account for the observed fat-tailed distributions. Our study 
represents the first attempt to understand the dynamical processes underlying human interest, which has 
significant applications in science and engineering, commerce, as well as defense, in terms of specific tasks 
such as recommendation and human-behavior prediction. 



A fundamental feature of a human society is that its individuals possess all kinds of interests, the driving 
force of many human behaviors. Some interests may last for a lifetime while others can fade away in a short 
time. From time to time our interests also change. In the modern society that we live in, all kinds of 
attractions and temptations emerge and disappear on a daily basis. Does this mean that the evolution of our 
interest is mostly random? Or are there intrinsic dynamical rules that govern how human interests evolve with 
time? Answering these questions has been deemed extremely difficult, due to the lack of appropriate means to 
characterize the human mind and to measure quantitatively how it changes with time. Yet the questions are 
fundamental in science, and any revelation of the dynamics of human interest may have significant applications in 
commerce, medical sciences, and even defense. In particular, in commerce, adequate knowledge of customer 
interests and how they change with time is key to the success of many businesses, as such knowledge can be of 
tremendous value to advertisement design and product promotion. In psychiatry, a good understanding of 
patients' interests may help generate accurate diagnoses and devise effective therapeutic approaches. In defense, 
timely and reliable assessment of the interests of certain groups or individuals, and of their time evolution, 
can help predict the possible future behaviors and actions of those groups or individuals. All of these 
applications rely on human-interest dynamics not being completely random. 

There have been efforts in modeling and understanding human behaviors that are essential to many social and 
economic phenomena, with significant applications in areas ranging from resource allocation and transportation 
control to epidemic prediction and personal recommendation. The pursuit has been facilitated greatly by 
advances in information technology, especially by the availability of massive Internet data and resources. 
However, probing into human-interest dynamics is more challenging, due to the difficulty in characterizing 
human interests and the traditional lack of data sets from which the underlying dynamical processes may be deduced. 
In recent years "Big Data" sets, such as those from e-commerce or mobile-phone communications, have become 
commonly available, making it possible to quantify human interests and to infer their intrinsic dynamics. As a 
branch of the science of "Big Data", the field of human-interest dynamics is in its infancy. 

A viable approach to probing into human-interest dynamics is to use data analysis as a gateway to uncover 
various phenomena and possible scaling laws. Guided by this principle, in this paper we explore two e-commerce 
data sets (Douban, Taobao) and one communication data set [Mobile-Phone Reading (MPR)], and focus on three 
issues: the statistical distribution of the time that an interest lasts, the distribution of the return time to 
a particular interest, and interest ranking and transition. Considering the large number of factors that can 
affect human interest, such as the specific activity contents and distractions of the individual's attention, 
it seems plausible that the underlying dynamics is completely random. Indeed, a widely used assumption is that 
of Markovian dynamics for individuals' online behaviors, in which an online user's next action depends not on 
his/her history of interests but on the current interest only. However, there is recent evidence of deviations 
from Markovian dynamics. Our systematic analysis of the three data sets reveals an unequivocal signature of the 
fat-tailed scaling behavior characteristic of non-equilibrium complex systems and, consequently, indicates the 
existence of intrinsic dynamical rules governing human-interest dynamics. Based on the empirical analysis, we 
identify three basic ingredients underlying the dynamics: preferential return, inertial effect, and exploration. 
A mathematical model incorporating these ingredients is then developed to account for the observed fat-tailed 
scaling behaviors. Our study represents the first systematic attempt to probe into the dynamics of human 
interest, and we expect our findings and model to have broad applications. 

We note that, in the study of human behaviors, heavy-tailed statistical features, e.g., those in the 
inter-event time distributions, have been uncovered recently. Such a non-Poisson type of distribution implies, 
e.g., that bursts of rapidly occurring events are typically separated by long periods of inactivity. Various 
mechanisms have been proposed to explain the heavy-tailed inter-event statistics, such as the 
highest-priority-first queue model, the Poisson probability model, varying interest, memory effects, and human 
interactions. Non-Poisson, heavy-tailed statistics also arise in human mobility trajectories, and mathematical 
models have been proposed to account for the non-Markovian type of dynamics underlying human mobility, such as 
those based on exploration and preferential return, hierarchy of traffic systems, and regular mobility. 
Variances in the statistical behaviors of human mobility were also reported. The distinct feature of our work 
is its focus on human-interest dynamics. 

Results 

We analyze three massive data sets: two from e-commerce, namely, Douban and Taobao, and one from mobile 
communication, i.e., MPR. We focus on the scaling of three quantities: (1) the time interval l that an 
individual stays within the same interest, defined as the length of a sequence of clicks within the same 
interest category (defined in Methods), (2) the time interval t after which an individual returns to the same 
interest category, defined as the sequence of clicks between two visits to the same interest, representing a 
kind of memory effect in the dynamics of interest, and (3) the frequencies of visit of an individual to 
different interests, which can be used to rank this individual's particular interests. 
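
The three quantities defined above can be extracted from a single user's click sequence along the following lines. This is a minimal sketch, not the authors' code: the function names are hypothetical, each click is assumed to be already labeled with its interest category, and the return time t is taken here to be the number of intervening clicks between two visits to the same interest, which is one possible reading of the definition.

```python
from collections import Counter
from itertools import groupby


def dwell_lengths(clicks):
    """Lengths l of maximal runs of consecutive clicks within one interest."""
    return [sum(1 for _ in run) for _, run in groupby(clicks)]


def hopping_series(clicks):
    """Merge consecutive clicks on the same interest into one hopping event."""
    return [interest for interest, _ in groupby(clicks)]


def return_times(clicks):
    """Return times t, counted as clicks made elsewhere between two visits
    to the same interest (assumed reading of the definition)."""
    last_seen = {}
    times = []
    for idx, interest in enumerate(clicks):
        if interest in last_seen:
            gap = idx - last_seen[interest] - 1
            if gap > 0:  # gap == 0 means the user never left the interest
                times.append(gap)
        last_seen[interest] = idx
    return times


def rank_frequencies(clicks):
    """Visit frequency of each interest, most frequent (rank r = 1) first."""
    total = len(clicks)
    return [(k, c / total) for k, c in Counter(clicks).most_common()]


clicks = [1, 1, 2, 2, 2, 1, 3]
print(dwell_lengths(clicks))   # [2, 3, 1, 1]
print(hopping_series(clicks))  # [1, 2, 1, 3]
print(return_times(clicks))    # [3]
```

Histogramming the outputs of `dwell_lengths` and `return_times` over a long click sequence gives empirical estimates of P(l) and P(t).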

Fat-tailed distribution of interest interval l. A number of approaches have been proposed to characterize an 
individual's interests, such as the interest profile, contextual information, distinct visited subpages, and 
service items. Taking advantage of the nature of our large data sets, we use categories to characterize an 
individual's interests, which can be, for example, music, books, and movies on Douban; clothing, footwear, and 
toys on Taobao; and love stories and science fiction on MPR. Figure 1(a) shows, for a typical individual on 
Douban, the distribution P(l) of the interval l over visits to different interest categories, which exhibits a 
fat-tailed form: P(l) ~ l^{-α}. The long tail associated with the scaling indicates that the individual tends 
to spend an abnormally long time visiting certain interests during browsing. Similar scaling behaviors have 
been found for users on Taobao and MPR, as shown in Figs. 1(b) and 1(c), respectively. A typical sequence of 
the values of l for one particular interest, in the order in which they appear, is shown in Fig. 1(d). From 
Fig. 1(d), we observe a highly non-uniform behavior in the values of l, which gives rise to the fat-tailed 
distribution in Fig. 1(a). We have examined 
many individuals from the three data sets, and found similar behaviors. In fact, the distribution of l for all 
users from any particular data set exhibits a robust fat-tailed distribution (Fig. S1 in Supplementary 
Information). The scaling observed for all cases implies substantial deviation of the human-interest dynamics 
from that of a Markovian process (associated with the transition probability matrix for interests), for which 
the distribution of l would be exponential. 

Figure 1 | Distribution of interest-dwelling time. (a-c) Probability distributions P(l) of the time interval l of consecutive visits to the same interest for 
three representative individuals, each from one of the three data sets (Douban, Taobao, and MPR), where the numbers of interests are 3, 24, and 44, 
respectively. The numbers of clicks (No) for the three cases are 18396, 106571, and 4398, respectively. The three distributions can be fitted as P(l) ~ l^{-α}, 
with exponents α ≈ 1.16, 4.02, and 3.35, respectively (the values of the exponent α are estimated using the maximum-likelihood criterion). Panel (d) 
shows the various values of l as they appear with time, where n is the event index (an integer variable). 
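
A maximum-likelihood estimate of a power-law exponent, as mentioned in the caption of Fig. 1, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses the standard continuous-variable estimator α = 1 + n / Σ ln(x_i / x_min), whereas the dwell lengths l are integers, for which a discrete estimator would be more appropriate. The function names and the synthetic test data are hypothetical.

```python
import math
import random


def powerlaw_mle(samples, x_min=1.0):
    """Continuous MLE of alpha for P(x) ~ x^(-alpha), using samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, size, rng):
    """Draw synthetic power-law samples by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so every sample is >= x_min.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]


rng = random.Random(42)
synthetic = sample_powerlaw(alpha=2.5, x_min=1.0, size=20000, rng=rng)
print(round(powerlaw_mle(synthetic, x_min=1.0), 2))  # close to 2.5
```

For comparison, a first-order Markov chain with self-transition probability q would give a geometric distribution of l, proportional to q^(l-1), which decays exponentially rather than as a power law.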

Memory effect in human-interest dynamics. Memory, as one of the key attributes of human beings, has been widely 
studied in the past (refs. 23, 24, 35, 41-44). We observe from our data sets that, often, an individual tends 
to return to specific interests that he/she has recently visited with higher probabilities than to those 
visited long ago. For example, even when an interest had been visited many times in the past, if the most 
recent visit dates back one year or longer, the probability of revisiting is lower as compared with that 
associated with another interest that was visited merely a week ago. But would the probability that an interest 
is revisited after a very long time be exponentially small? To answer this question, we calculate the 
distribution of the return time t, the time interval after which an individual revisits the same interest 
following the last visit. Typical distributions from three individuals, one from each data set, are shown in 
Figs. 2(a-c), which can again be well fitted by fat-tailed distributions: P(t) ~ t^{-β}, with exponent β. While 
P(t) is higher for small values of t, the probability of the occurrence of very large values of t is, 
surprisingly, not exponentially small, indicating that such events can indeed occur. An important implication 
is that both short-term and long-term memories can shape human-interest dynamics. Similar results are obtained 
for many other users (Fig. S5 in Supplementary Information). Additionally, the distribution of t for all users 
from any particular data set exhibits a fat-tailed distribution (Fig. S1 in Supplementary Information). 

Interest ranking and transition among interests. An individual can 
possess a number of interests, which can be ranked in terms of the 
respective frequencies of visit. In a given (large) time interval, an 
individual can focus on different interests, giving rise to a kind of 
"transition" among the interests. The interest ranking and transition 
are important not only for the study of human dynamics and 
decision-making, but also for applications such as behavior 
prediction and search-algorithm design. 

A convenient way to assess the interest-transition pattern for an individual is to use a network 
representation, where nodes denote different interests with sizes determined by their ranks, links correspond 
to the observed transitions among the interests, and the dwelling time in any particular interest is 
represented by a self loop. Similar network representations have also been used in other contexts such as 
transportation dynamics, citations, and human-mobility behaviors. Figures 3(a-c) show examples of the 
transition networks of one typical individual from each of the three data sets, respectively. Setting the most 
frequently visited interest to have rank r = 1 and the successively less frequently visited interests to have 
ranks r = 2, 3, and so on, we can generate a distribution of the interest rank for each individual, examples of 
which are shown in Figs. 3(d-f). In all cases, such a rank distribution can be approximately fitted by the 
following exponentially truncated fat-tailed distribution: f_r ∝ r^{-γ} exp(-r/S), where f_r is the visit 
frequency of the interest with rank r and S is the number of distinct interests that the individual has 
selected. Note that this truncated fat-tailed distribution is with respect to an individual. When the 
collective behavior of a large number of individuals is considered, the signature of the exponential truncation 
diminishes and the scaling of f_r can be better fitted by a fat-tailed distribution (see Fig. S1 in 
Supplementary Information). This is similar to the fat-tailed ranking distribution observed in collective 
human-mobility patterns, where the distribution is with respect to the actual locations that the individual 
visits physically. 

Model of human-interest dynamics. To gain insights into the development of a quantitative model describing the 
dynamics of human interest, we study the transition pattern of any individual among interests, which can be 
characterized by the probability for transitional events to take place between interests i and j, defined as 

p(i,j) = n(i,j) / Σ_{i',j'} n(i',j'), 

where n(i,j) is the number of switchings from interest i to j. Examples of the transition probabilities, those 
corresponding to the respective transition networks in Figs. 3(a-c), are shown in Figs. 3(g-i) in the 
two-dimensional representation of i and j. We observe two key features: (i) p(i,j) exhibits relatively large 
values for transitions among the highly ranked interests (note that r = 1 corresponds to the highest ranked 
interest), and (ii) the diagonal elements p(i,i) have relatively large values as well. The first feature 
suggests a kind of preferential selection of interests: individuals tend to return to highly ranked interests 
with relatively larger probabilities and stay in these interests. The second feature indicates an inertial 
effect: an individual tends to stay in the interest that he/she has already been exploring. These two 
ingredients, preferential return and inertia, plus an individual's desire to explore new interests, are the 
basic ingredients underlying human-interest dynamics, based on which a phenomenological 
model can be developed. 
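
The transition probabilities p(i,j) can be estimated from a click sequence roughly as follows. This sketch rests on two assumptions: n(i,j) counts every pair of consecutive clicks, including i = j (so that the diagonal p(i,i) reflects dwelling), and the normalization runs over all ordered pairs. The function name is hypothetical.

```python
from collections import Counter


def transition_probabilities(clicks):
    """Estimate p(i, j) = n(i, j) / sum over all pairs of n(i', j')."""
    # n(i, j): number of times a click on interest i is followed by one on j.
    pair_counts = Counter(zip(clicks, clicks[1:]))
    total = sum(pair_counts.values())
    return {pair: count / total for pair, count in pair_counts.items()}


clicks = [1, 1, 2, 2, 2, 1, 3]
p = transition_probabilities(clicks)
print(p[(2, 2)])  # diagonal element for interest 2: 2 of the 6 transitions
```

With this convention, large diagonal entries correspond to the inertial effect, and large entries among the most visited interests correspond to preferential return.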

Figure 2 | Memory effect of human-interest dynamics. (a-c) For the data sets in Figs. 1(a-c), respectively, fat-tailed distributions P(t) ~ t^{-β} of the time t taken 
to revisit the same interest. The values of the fitted exponent β are approximately 1.58, 2.04, and 1.41 for (a-c), respectively. 

Figure 3 | Interest-transition network and transition probabilities. (a-c) For the three individuals represented in Figs. 1(a-c), the respective transition 
networks, where nodes correspond to distinct interests, a self loop represents the dwelling time in the same interest category, and the weighted links 
characterize the interest transitions. A few highly frequently visited interests are marked. (d-f) Truncated fat-tailed behavior in the rank distribution: 
f_r ∝ r^{-γ} exp(-r/S), where the fitted values of the exponent γ and the numbers of interests are (γ, S) = (0.89, 24) (panel (e), Taobao) and 
(γ, S) = (1.39, 44) (panel (f), MPR). The dashed line in panel (d) is a guide to the eye. (g-i) Two-dimensional representation of the interest-transition 
probabilities for the three networks in (a-c), respectively. The probabilities are represented on a logarithmic scale; see side bars. 

A schematic illustration of our model is shown in Fig. 4(a). To initiate the dynamical evolution of interest, 
an individual has two options: exploration of a new interest or return to one of the previously visited 
interests, with probability ρ n^{-λ} and 1 - ρ n^{-λ}, respectively, where 0 < ρ < 1 and λ > 0 are parameters, 
and n denotes the number of hopping events among different interests, which is obtained by merging consecutive 
clicks on the same interest in the click-event series into one. For example, the click-event series 
1, 1, 2, 2, 2, 1, 3 with 7 actions can be transformed into the following hopping-event series: 1, 2, 1, 3, 
where n = 4. In the exploration state, the individual visits a new interest and continuously browses the same 
interest, due to the effect of inertia. At a "microscopic" level, inertial browsing can be regarded as an 
excited random-walk (ERW) process. If the individual returns to the set of previously visited interests, he/she 
preferentially selects an interest category to browse according to the prior probability of visit to the same 
interest. Once a particular interest is chosen, the inertial effect sets in and the individual has the tendency 
to stay in the same interest category. The microscopic browsing behavior again can be modeled by an excited 
random-walk process. A detailed mathematical analysis of the model in Fig. 4(a) can be found in Supplementary 
Information. Examples of the predicted scaling relations are illustrated in Figs. 4(b-d) (with more examples in 
Supplementary Information), which are consistent with those uncovered from real data as exemplified in Figs. 1-3. 
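
The exploration and preferential-return steps of the model can be sketched in a few lines. This is a simplified illustration, not the authors' simulation: the excited random walk that governs inertia is specified only in the Supplementary Information, so the dwell length here is drawn from a plain geometric rule with a hypothetical parameter `stay_prob`, which will not reproduce the fat-tailed P(l). The default values of the exploration parameters (0.6 and 0.4) are taken from the setting quoted in the caption of Fig. 4; all function and variable names are assumptions.

```python
import random
from collections import Counter


def simulate_interest_model(n_hops, rho=0.6, lam=0.4, stay_prob=0.5, seed=0):
    """Generate a click sequence from exploration, preferential return,
    and a placeholder inertia rule (geometric dwell, NOT the ERW)."""
    rng = random.Random(seed)
    visits = Counter()  # clicks accumulated on each interest so far
    clicks = []
    next_new = 0        # label of the next unexplored interest
    current = None
    for n in range(1, n_hops + 1):
        # A hop must lead to a different interest, so exclude the current one.
        candidates = [k for k in visits if k != current]
        explore = not candidates or rng.random() < rho * n ** (-lam)
        if explore:
            choice = next_new
            next_new += 1
        else:
            # Preferential return: weight by past visit counts.
            weights = [visits[k] for k in candidates]
            choice = rng.choices(candidates, weights=weights)[0]
        # Placeholder inertia: keep clicking with probability stay_prob.
        dwell = 1
        while rng.random() < stay_prob:
            dwell += 1
        clicks.extend([choice] * dwell)
        visits[choice] += dwell
        current = choice
    return clicks


clicks = simulate_interest_model(n_hops=1000)
print(len(clicks), len(set(clicks)))  # total clicks, distinct interests
```

Replacing the geometric dwell rule with a sampler that implements the excited random walk would be the natural next step toward the full model.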

Discussion 

Despite recent efforts in human-mobility dynamics, little is known about human-interest dynamics. We aim to 
explore the fundamental mechanisms underpinning human-interest dynamics through a completely data-driven 
approach. In particular, we have analyzed three large-scale data sets, two from e-commerce and one from mobile 
communication, and uncovered the emergence of fat-tailed behaviors in a number of fundamental quantities. These 
are the interval l of stay in an interest, the time interval t to return to a previously visited interest, and 
the interest-ranking distribution. A detailed analysis of the patterns of the transition probabilities among 
different interests suggests preferential return, inertia, and exploration as the three basic dynamical 
ingredients underlying human-interest dynamics, enabling us to construct a phenomenological, random-walk based 
model. The model captures the essential features of human-interest dynamics in that it is constructed based on 
generic ingredients extracted from real data, and it is capable of reproducing the scaling laws observed from 
data. The model, however, may still be idealized as it cannot predict the scaling exponents. To develop a more 
predictive model, additional effects must be included, such as an individual's memory effect, cognitive 
activities, and the specific web categories. Nonetheless, the current model provides a phenomenological 
framework in which the basic properties and scaling behaviors associated with human-interest 
dynamics can be explained. 

Figure 4 | Proposed model of human-interest dynamics and predicted scaling relations. (a) Schematic illustration of the model, where an individual can 
enter one of two dynamically complementary states at each hopping step: exploring new interests with probability ρ n^{-λ} (the state of 
"Exploration", the white circles representing available new interests) or returning preferentially to a previously explored interest with probability 
1 - ρ n^{-λ} (the state of "Preferential return", the circles of different colors illustrating visited interests, with sizes corresponding to the 
frequencies with which they are visited). Regardless of which state takes place, once an interest is selected, an inertial effect is triggered, which can be 
modeled as an excited random walk (ERW). (b, c) Fat-tailed distributions P(l) and P(t), respectively. (d, e) Predicted interest-ranking distribution and 
transition-probability pattern, respectively. These results are obtained from model simulations where the number of agents in each case is 1000, for the 
parameter setting λ = 0.4 and ρ = 0.6. For P(l), an analytic result can be derived in terms of f and 1 - f, the probabilities of moving towards the 
"right" or the "left", respectively. In (b-d), three values of f are used: f = 0.4, f = 0.5, and f = 0.6. In (e), the value of f is 0.5. 

The fat-tailed distributions uncovered from data and the dynamical model developed accordingly can be applied 
to addressing significant problems ranging from human-behavior prediction and the design of search algorithms 
to controlling spreading dynamics. As a demonstration, we have quantified the degree of predictability of 
user-behavior patterns underlying the three data sets by using the statistical measures of entropy and the Fano 
inequality, with the result that such patterns are in fact quite predictable, despite the apparent randomness 
in the human-interest dynamics (see Supplementary Information). 
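
One common way to turn an entropy into a predictability bound through the Fano inequality is sketched below. The details of the authors' calculation are in the Supplementary Information and may differ; this version assumes a frequency-based Shannon entropy S that ignores temporal order, and solves S = H_b(Π) + (1 - Π) log2(N - 1) for the upper bound Π on the fraction of correctly predicted visits, where H_b is the binary entropy and N is the number of distinct interests. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (bits) of the visit-frequency distribution."""
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(sequence).values())


def max_predictability(entropy, n_states, tol=1e-9):
    """Solve the Fano relation for the predictability bound by bisection."""
    if n_states < 2:
        return 1.0  # a single interest is trivially predictable

    def fano(pi):
        binary = 0.0
        if 0.0 < pi < 1.0:
            binary = -pi * math.log2(pi) - (1 - pi) * math.log2(1 - pi)
        return binary + (1 - pi) * math.log2(n_states - 1)

    # fano decreases from log2(N) at pi = 1/N to 0 at pi = 1.
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


clicks = [1, 1, 2, 2, 2, 1, 3]
s = shannon_entropy(clicks)
print(max_predictability(s, n_states=len(set(clicks))))
```

An entropy estimator that accounts for the temporal order of visits would generally give a lower entropy and hence a higher predictability bound than this frequency-based version.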

Methods 

Data collection. The massive data sets used in this article are from large-scale real e-commerce 
and communication systems: Douban, Taobao, and MPR. For fair 
comparison, in each data set we focus on users who performed at least 100 actions. 
Data description and basic statistical properties are listed in Table 1. 

(i) Douban. The experimental data set is randomly sampled from Douban, a major e-commerce 
company in China. It is similar to Social Networking Services (SNS) 
in that it allows registered users to record information and create contents related to 
movies, books, music, etc., and it can also make personalized recommendations 
for the registered users. In this data set, we select 21,148 individuals, each executing at 
least 100 rating actions, from which we can find historical information about the 
users, such as user ID, item ID, rating, timestamps, and item types (considered as 
interest types). The sampling time resolution is one second. 



(ii) Taobao. The Chinese web site Taobao is one of the world's largest electronic 
marketplaces. The browsing behaviors of users on Taobao are recorded, and any user 
can browse and trade with any other users. Our data is composed of all browsing 
behaviors of 34,330 users, each browsing more than 100 items in the time span 
between September 1 and October 28, 2011. For each user, information is available 
such as the user ID, item ID, item classes (regarded as interest types), timestamps, etc. 
The sampling time resolution is one second. 

(iii) MPR. Mobile-Phone Reading (MPR) is a widely used electronic reading tool. The usage of such a mobile service 
reflects customers' interests well. We collected the reading records of 19,067 users, 
each performing more than 100 reading tasks between October 1 and October 31, 
2011. The categories of books that each reader chose are regarded as interests. The 
sampling time resolution is one day. 



Table 1 | Basic parameters of the three massive data sets studied in this paper 

Data set    Users     Time span    Origin 
Douban      21,148    18 months    This article 
Taobao      34,330    2 months     This article 
MPR         19,067    1 month      This article 

Definition of length of interest interval l. Previous studies defined a session as a 
sequence of Web pages viewed by a user within a given time window, which has been 
widely used in modeling and tracking individuals' navigation behaviors. 
However, for characterizing human interest, this definition of session has two 
deficiencies: (1) the difficulty of splitting an individual's click sequence into sessions, due to 
the continuous nature of the user's online activities, and (2) a limitation of the data sets, 
namely the coarse time resolution of MPR (one day). Thus, we define the interest duration l as the 
length of a sequence of clicks within the same interest category. 